Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Spark concat_ws function #8854

Closed
wants to merge 1 commit into from

Conversation

PHILO-HE
Copy link
Contributor

@PHILO-HE PHILO-HE commented Feb 26, 2024

Add concat_ws Spark function which returns the concatenation for the
input, separated by a separator (the first argument). It allows variable
number of VARCHAR or ARRAY<VARCHAR> arguments. And these two
types can be used in combination.

This function is a bit similar to ConcatFunction, except that concat_ws
requires separator and supports using ARRAY type and mixed types.

This PR is based on #6292 (author: @unigof). There are a few bug fixes
and improvements. Also made some changes to align with Spark.

Doc link.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 26, 2024
Copy link

netlify bot commented Feb 26, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 3bcce79
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6752a36cb48c4c0008b6a9a8

@PHILO-HE
Copy link
Contributor Author

Found Spark supports arguments with String/Array type mixed used. Needs to enhance this patch.

SELECT concat_ws(,, a, b, array(c, d), e);
"a,b,c,d,e"

@PHILO-HE PHILO-HE force-pushed the concat_ws branch 3 times, most recently from b2bca03 to 7efd217 Compare March 26, 2024 14:49
@PHILO-HE PHILO-HE force-pushed the concat_ws branch 9 times, most recently from 3033eb8 to a1f105b Compare April 10, 2024 02:58
@PHILO-HE
Copy link
Contributor Author

Found Spark supports arguments with String/Array type mixed used. Needs to enhance this patch.

SELECT concat_ws(,, a, b, array(c, d), e);
"a,b,c,d,e"

Supported such case in the latest code.

@PHILO-HE
Copy link
Contributor Author

@rui-mo, could you take a review?

@PHILO-HE PHILO-HE force-pushed the concat_ws branch 2 times, most recently from 379108a to 8203b08 Compare April 10, 2024 03:11
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/string.rst Outdated Show resolved Hide resolved
@PHILO-HE PHILO-HE force-pushed the concat_ws branch 2 times, most recently from c2638ed to bbfa05e Compare April 12, 2024 03:35
@PHILO-HE
Copy link
Contributor Author

@rui-mo, could you review again? Thanks!

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

velox/functions/sparksql/String.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/String.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/String.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/String.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

velox/functions/sparksql/String.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me overall % some minors.

"concat_ws",
std::vector<facebook::velox::exec::FunctionSignaturePtr>{
// Signature: concat_ws (separator, input,...) -> output:
// varchar, varchar, varchar,.. -> varchar
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to cover the case of mixed string and array?

Copy link
Contributor Author

@PHILO-HE PHILO-HE Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When array is enabled, the fuzzer test framework reports one issue when max_level_of_nesting=10. It seems not related to our implementation logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show the exception? Maybe we should fix it first.

Copy link
Contributor Author

@PHILO-HE PHILO-HE Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below error message. I run the test with --velox_fuzzer_enable_complex_types. Seems this option has not been enabled for spark fuzzer test? cc @rui-mo

    @ 0000000003cbcd71 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       /root/PHILO/workspace/velox/deps-download/folly/folly/debugging/symbolizer/SignalHandler.cpp:453
    @ 000000000001441f (unknown)
    @ 0000000001f13a7a facebook::velox::exec::LocalSelectivityVector::~LocalSelectivityVector()
    @ 000000000129f0f6 facebook::velox::exec::Expr::evalSimplified(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) [clone .cold]
    @ 00000000038ca794 facebook::velox::exec::Expr::evalSimplifiedImpl(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
    @ 00000000038cb077 facebook::velox::exec::Expr::evalSimplified(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
    @ 00000000038ca794 facebook::velox::exec::Expr::evalSimplifiedImpl(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
    @ 00000000038cb077 facebook::velox::exec::Expr::evalSimplified(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
    @ 00000000038cb1bd facebook::velox::exec::ExprSetSimplified::eval(int, int, bool, facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
    @ 0000000001fd4743 facebook::velox::test::ExpressionVerifier::verify(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, std::shared_ptr<facebook::velox::RowVector> const&, std::optional<facebook::velox::SelectivityVector> const&, std::shared_ptr<facebook::velox::BaseVector>&&, bool, facebook::velox::fuzzer::InputRowMetadata const&)
    @ 00000000013dd3a5 facebook::velox::fuzzer::ExpressionFuzzerVerifier::go()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PHILO-HE Would you open an issue so we can take a further look when enabling the velox_fuzzer_enable_complex_types option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo, just filed one issue: #11727.

velox/functions/sparksql/tests/ConcatWsTest.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/tests/ConcatWsTest.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/ConcatWs.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/ConcatWs.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/tests/ConcatWsTest.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/tests/ConcatWsTest.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/tests/ConcatWsTest.cpp Outdated Show resolved Hide resolved
@PHILO-HE PHILO-HE force-pushed the concat_ws branch 2 times, most recently from 9687831 to 3d3e184 Compare December 2, 2024 03:23
return totalResultBytes;
}

/// Initialize some vectors for inputs. And concatenate consecutive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// for private function, not revel to users

@PHILO-HE PHILO-HE force-pushed the concat_ws branch 2 times, most recently from 16adc62 to c305598 Compare December 2, 2024 03:44
@rui-mo rui-mo changed the title feat: Add concat_ws Spark function feat: Add Spark concat_ws function Dec 2, 2024
@@ -253,6 +253,17 @@ static const std::unordered_map<
/// them to fuzzer instead of hard-coding signatures here.
getSignaturesForCast(),
},
{
"concat_ws",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe include a note indicating that this signature is only for Spark SQL. We will separate the Presto and Spark signatures in #9949. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo, just updated. Thanks!

@rui-mo rui-mo added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Dec 3, 2024
@@ -44,6 +44,9 @@ void registerSpecialFormGeneralFunctions(const std::string& prefix) {
"cast", std::make_unique<SparkCastCallToSpecialForm>());
registerFunctionCallToSpecialForm(
"try_cast", std::make_unique<SparkTryCastCallToSpecialForm>());
registerFunctionCallToSpecialForm(
ConcatWsCallToSpecialForm::kConcatWs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose concat_ws should be in RegisterString.cpp.

Copy link
Contributor Author

@PHILO-HE PHILO-HE Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yohahaha, seems all special form functions should be registered in RegisterSpecialForm.cpp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yohahaha, just updated this pr according to your suggestion. Thanks!

@PHILO-HE
Copy link
Contributor Author

PHILO-HE commented Dec 4, 2024

@xiaoxmeng, could you merge this pr if you have no comment?

@PHILO-HE
Copy link
Contributor Author

PHILO-HE commented Dec 6, 2024

Friendly ping @xiaoxmeng. Could you merge this pr? The CI failure is not related.

@facebook-github-bot
Copy link
Contributor

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@PHILO-HE
Copy link
Contributor Author

PHILO-HE commented Dec 9, 2024

@bikramSingh91, thanks for importing the pr! I note internal test is red. If it's not related to this pr, could you merge it?

@PHILO-HE
Copy link
Contributor Author

Friendly ping @bikramSingh91, could you merge this pr if no other change is required?

@PHILO-HE
Copy link
Contributor Author

Hi @kgpai, can you merge this pr if you have no comment?

@kgpai
Copy link
Contributor

kgpai commented Dec 13, 2024

@PHILO-HE Following up with @bikramSingh91

@facebook-github-bot
Copy link
Contributor

@bikramSingh91 merged this pull request in bddddf8.

athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
Summary:
Add concat_ws Spark function which returns the concatenation for the
input, separated by a separator (the first argument). It allows variable
number of VARCHAR or ARRAY\<VARCHAR\> arguments. And these two
types can be used in combination.

This function is a bit similar to [ConcatFunction](https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/StringFunctions.cpp#L140), except that `concat_ws`
requires separator and supports using ARRAY<VARCHAR> type and mixed types.

This PR is based on facebookincubator#6292 (author: unigof). There are a few bug fixes
and improvements. Also made some changes to align with Spark.

Doc [link](https://docs.databricks.com/en/sql/language-manual/functions/concat_ws.html).

Pull Request resolved: facebookincubator#8854

Reviewed By: kgpai

Differential Revision: D66898251

Pulled By: bikramSingh91

fbshipit-source-id: 1fcd193a245bea4062c4e20d1e1db9ad6cc3290b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants